Dataset: marketing_data.csv

Domain: Marketing, Retail

Q. 1 Import the necessary libraries, load the dataset, and display 5 random samples. Check the info of the data and write your findings.
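
A minimal sketch of the loading step. The real file is not available here, so a small in-memory CSV stands in for marketing_data.csv; its column names and rows are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for marketing_data.csv. With the real file, use
# pd.read_csv("marketing_data.csv") instead.
csv_text = """ID,Year_Birth,Education,Income
1,1970,Graduation,"$84,835.00"
2,1961,PhD,"$57,091.00"
3,1958,Master,
4,1967,Basic,"$32,474.00"
5,1989,Graduation,"$21,474.00"
6,1954,PhD,"$71,691.00"
"""
df = pd.read_csv(io.StringIO(csv_text))

sample = df.sample(5, random_state=0)  # 5 random rows, reproducible
print(sample)
df.info()  # dtypes and non-null counts per column
```

The non-null counts printed by info() are the quickest way to spot columns with missing values or an unexpected object dtype.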

Q. 2 Check the following using an appropriate method and write your findings.

i) Check how spread out or varied your data set is.

ii) Check where the middle 50% of your data lies.

iii) Check boundaries for the lower, middle and upper quarters of data.
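
One possible way to answer the three checks with pandas, shown on a hypothetical numeric series rather than the real dataset: the standard deviation for spread, the interquartile range for the middle 50%, and the quartiles for the boundaries.

```python
import pandas as pd

# Hypothetical numeric column standing in for a column of the dataset.
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80], name="value")

spread = s.std()                             # i) sample standard deviation
q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])   # iii) quartile boundaries
iqr = q3 - q1                                # ii) width of the middle 50%

print(s.describe())  # count, mean, std, min, quartiles, max in one call
print(f"std={spread:.2f}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```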

Q.3 Check for any missing values in the dataset and handle them using an appropriate method.

Only the "Income" column has null entries: 24 out of 2240 total entries.
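
A minimal sketch of counting and filling the missing values. It assumes Income is already numeric and uses median imputation, which is one reasonable choice rather than the method the original analysis necessarily used; the rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; Income is assumed to be numeric already.
df = pd.DataFrame({"Income": [84835.0, np.nan, 57091.0, 32474.0, np.nan]})

missing_before = df.isnull().sum()       # null count per column
median_income = df["Income"].median()    # median ignores NaN by default
df["Income"] = df["Income"].fillna(median_income)

print(missing_before)
print(df)
```

The median is less sensitive to extreme incomes than the mean, which is why it is often preferred for skewed columns.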

Q.4 Check for any presence of special characters in any variables. If present, clean/replace and change the datatype of the variable if required.

  1. The $ and , characters appear in Income.
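
The cleaning step could look like the sketch below: strip the $ and , characters, then convert the column to a numeric dtype. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical Income strings containing the $ and , characters.
df = pd.DataFrame({"Income": ["$84,835.00", "$57,091.00", None]})

cleaned = df["Income"].str.replace(r"[$,]", "", regex=True).str.strip()
df["Income"] = pd.to_numeric(cleaned)  # missing entries become NaN

print(df.dtypes)
print(df)
```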

Q.5 The Marketing Manager wants to know the 'Age' of the customers. Extract the feature 'Age' from the given dataset and display its statistical summary.

  1. Customer ages range from 26 to 129.
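
A sketch of deriving Age, assuming the dataset has a birth-year column named Year_Birth and that the analysis was run in 2022. Both are assumptions, and the three rows are hypothetical values chosen to reproduce the 26 to 129 range.

```python
import pandas as pd

REFERENCE_YEAR = 2022  # assumption: the year the analysis was run

# Hypothetical rows; Year_Birth is an assumed column name.
df = pd.DataFrame({"Year_Birth": [1893, 1970, 1996]})
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]

summary = df["Age"].describe()
print(summary)
```

A maximum age of 129 suggests a data-entry error in the birth year, so such rows may deserve a closer look before further analysis.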

Q.6 The Marketing manager wants to understand the total amount spent on various products so that we can find what percentage of the amount is spent on which product.

a. Find out the total amount spent by a customer.

We have 6 products for customers.

  1. MntWines
  2. MntFruits
  3. MntMeatProducts
  4. MntFishProducts
  5. MntSweetProducts
  6. MntGoldProds

b. Display the Percentage of the amount spent on Wines and other products.
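
Both parts can be sketched as a row-wise sum followed by a share calculation. The six product columns come from the list above; the two customer rows are hypothetical.

```python
import pandas as pd

product_cols = ["MntWines", "MntFruits", "MntMeatProducts",
                "MntFishProducts", "MntSweetProducts", "MntGoldProds"]

# Hypothetical spending for two customers.
df = pd.DataFrame([[100, 10, 50, 20, 10, 10],
                   [300, 0, 60, 20, 10, 10]], columns=product_cols)

# a. Total amount spent by each customer.
df["Total_Amount"] = df[product_cols].sum(axis=1)

# b. Share of the overall amount spent on each product, in percent.
pct_by_product = df[product_cols].sum() / df["Total_Amount"].sum() * 100
print(pct_by_product.round(2))
```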

Q.7 As an analyst, understand the total number of purchases made through different channels, which can help find the percentage each channel contributes.

a. Find out the total purchases done by a customer through different channels.

List of different purchase channels

  1. NumDealsPurchases
  2. NumWebPurchases
  3. NumCatalogPurchases
  4. NumStorePurchases

b. Display the percentage of the store and other channels’ contribution to the total purchases.
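
The channel totals follow the same pattern as the product totals. A minimal sketch with hypothetical purchase counts for two customers:

```python
import pandas as pd

channel_cols = ["NumDealsPurchases", "NumWebPurchases",
                "NumCatalogPurchases", "NumStorePurchases"]

# Hypothetical purchase counts for two customers.
df = pd.DataFrame([[2, 4, 1, 3],
                   [1, 2, 3, 4]], columns=channel_cols)

# a. Total purchases by each customer across all channels.
df["Total_Purchase"] = df[channel_cols].sum(axis=1)

# b. Each channel's contribution to all purchases, in percent.
pct_by_channel = df[channel_cols].sum() / df["Total_Purchase"].sum() * 100
print(pct_by_channel)
```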

Q.8 The marketing manager wants to understand the performance of different marketing campaigns. Find out which marketing campaign is most successful? Use suitable graphs for visualization. (Hint:- use features like AcceptedCmp for campaign information).

Findings

  1. 0 shows that the customer did not accept the offer.
  2. 1 shows that the customer accepted the offer.
  3. The count of 0 is very large in every campaign, which shows that most customers rejected each campaign.
  4. The count plot above shows that, of all campaigns, the last campaign ('Response') is the most successful.
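
One way to compare the campaigns is to sum the 0/1 acceptance flags per campaign column. The sketch below uses hypothetical rows; the column names AcceptedCmp1 to AcceptedCmp5 and Response follow the names used in this document.

```python
import pandas as pd

campaign_cols = ["AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                 "AcceptedCmp4", "AcceptedCmp5", "Response"]

# Hypothetical flags: 1 = accepted, 0 = not accepted.
df = pd.DataFrame({
    "AcceptedCmp1": [0, 0, 1, 0, 0],
    "AcceptedCmp2": [0, 0, 0, 0, 0],
    "AcceptedCmp3": [0, 1, 0, 0, 0],
    "AcceptedCmp4": [0, 0, 1, 0, 0],
    "AcceptedCmp5": [0, 0, 1, 0, 0],
    "Response":     [1, 1, 0, 0, 0],
})

accepted = df[campaign_cols].sum().sort_values(ascending=False)
acceptance_rate = df[campaign_cols].mean() * 100  # percent of customers
best = accepted.idxmax()
print(accepted)
print("Most successful campaign:", best)
```

Calling accepted.plot(kind="bar") would draw the corresponding bar chart if matplotlib is installed.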

Q.9 The marketing manager wants to understand which products are performing the best and which are performing the least in terms of revenue. Being an analyst, analyse the data and plot a suitable graph to display a report on revenue generated by different products.

Findings

  1. Total revenue generated by all 6 products is 1345279.
  2. The best-performing product is MntWines, with 676083 (50.26% of total).
  3. The worst-performing product is MntFruits, with 58405 (4.34% of total).
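
The percentages in the findings can be checked directly from the reported totals:

```python
# Revenue figures taken from the findings above.
total_revenue = 1345279
wines_revenue = 676083
fruits_revenue = 58405

wines_pct = round(wines_revenue / total_revenue * 100, 2)
fruits_pct = round(fruits_revenue / total_revenue * 100, 2)

print(f"MntWines:  {wines_pct}% of total")
print(f"MntFruits: {fruits_pct}% of total")
```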

Q.10 The team wants to understand if there’s any pattern between the age of customers and the last campaign acceptance rate. Plot a suitable graph to visualize the distribution of the age with respect to customers who accepted the last campaign.

Findings

  1. The correlation between Age and Response (last campaign) is -0.02, which indicates a very weak, near-zero correlation.
  2. Customers who accepted the last campaign are mostly around 50 years old.
  3. The age distribution of customers who accepted the last campaign is slightly right-skewed, with a tail toward ages above 50.
  4. Acceptance rises quickly with age up to about 50 and then declines slowly.
  5. No specific pattern found.
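
A small sketch of the numbers behind such a plot: the Pearson correlation between Age and the 0/1 Response flag, plus an age summary per response group. The rows are hypothetical and do not reproduce the -0.02 reported above.

```python
import pandas as pd

# Hypothetical ages and last-campaign responses (1 = accepted).
df = pd.DataFrame({"Age": [30, 45, 50, 55, 60, 70],
                   "Response": [0, 1, 1, 1, 0, 0]})

corr = df["Age"].corr(df["Response"])                  # Pearson correlation
age_by_response = df.groupby("Response")["Age"].describe()
accepted_median = df.loc[df["Response"] == 1, "Age"].median()

print(f"corr(Age, Response) = {corr:.2f}")
print(age_by_response)
```

A histogram or KDE of Age restricted to Response == 1 would then show the distribution visually.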

Q.11 The Chief Marketing specialist wants to visually see which Country has the most number of customers who accepted the last campaign. What is your approach?

Approach:

  1. Filter the dataframe for only last campaign offer accepted customers.
  2. Show country wise data as bar plot with count for each country.

Result: Country SP has 176 customers who accepted the last campaign offer.
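
The two steps of the approach can be sketched as a boolean filter followed by value_counts. The rows are hypothetical, and the column names Country and Response are assumed from the wording of the question.

```python
import pandas as pd

# Hypothetical rows; 1 in Response means the last campaign was accepted.
df = pd.DataFrame({"Country": ["SP", "SP", "CA", "SP", "US", "CA"],
                   "Response": [1, 1, 1, 0, 0, 0]})

accepted = df[df["Response"] == 1]            # step 1: keep acceptors only
counts = accepted["Country"].value_counts()   # step 2: count per country
top_country = counts.idxmax()

print(counts)
print("Country with most acceptances:", top_country)
```

counts.plot(kind="bar") would give the bar plot described in step 2, assuming matplotlib is available.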

Q.12 Analyse visually and write your inferences about the relationship between the following:

i) Total amount spent Vs Dependents. (Dependents=['Kidhome']+['Teenhome'])

Inferences:

  1. The relationship between Total_Amount and Dependents is shown using a heatmap.
  2. The heatmap shows a negative correlation between Total_Amount and Dependents.
  3. Customers with fewer dependents spend more.
  4. The correlation is about -0.5, a moderate negative linear relationship: as the number of dependents increases, Total_Amount tends to decrease. The coefficient describes the strength of the relationship, not how often the decrease occurs.

ii) Total Purchases Vs Dependents.

Inferences

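  1. The heatmap shows a weak negative correlation.
  2. Total_Purchase tends to decrease slightly as Dependents increases, and vice versa. The coefficient (about one quarter in magnitude) measures the strength of the linear relationship, not how often the decrease occurs.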
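
A minimal sketch of building Dependents and measuring its relationship with spending. The rows are hypothetical, so the resulting coefficient will not match the values read from the heatmap.

```python
import pandas as pd

# Hypothetical household and spending data.
df = pd.DataFrame({"Kidhome": [0, 1, 0, 2, 1],
                   "Teenhome": [0, 0, 1, 1, 1],
                   "Total_Amount": [1500, 600, 800, 100, 300]})

df["Dependents"] = df["Kidhome"] + df["Teenhome"]
corr = df["Dependents"].corr(df["Total_Amount"])
mean_spend = df.groupby("Dependents")["Total_Amount"].mean()

print(f"corr(Dependents, Total_Amount) = {corr:.2f}")
print(mean_spend)
```

The grouped means make the direction of the relationship easier to read than the single coefficient does.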

Q.13 Perform Correlation Analysis and write your key inferences. (Hint:- visualise using an appropriate plot)

Inferences: Positive correlation found between

  1. Income and AcceptedCmp1 is 0.28.
  2. Income and AcceptedCmp5 is 0.34.
  3. Income and Total_Amount is 0.67 (strong).
  4. Income and Total_Purchase is 0.57 (strong).
  5. NumWebVisitsMonth and Dependents is 0.42.
  6. AcceptedCmp3 and Response is 0.25.
  7. AcceptedCmp4 and AcceptedCmp5 is 0.31.
  8. AcceptedCmp4 and AcceptedCmp1 is 0.24.
  9. AcceptedCmp4 and AcceptedCmp2 is 0.3.
  10. AcceptedCmp5 and Total_Amount is 0.47.
  11. Total_Purchase and Total_Amount is 0.76 (strong).

Negative correlation found between

  1. NumWebVisitsMonth and Income is -0.55 (strong).
  2. NumWebVisitsMonth and Total_Amount is -0.5 (strong).
  3. Dependents and Total_Amount is -0.5 (strong).

Note:

  1. A positive value means that as one variable increases, the other tends to increase.
  2. A negative value means that as one variable increases, the other tends to decrease.
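
A sketch of how such pairs could be extracted programmatically: compute the Pearson correlation matrix, keep each pair once, and filter by absolute value. The data is hypothetical, and the 0.5 threshold is an assumption that mirrors the pairs marked strong above.

```python
import numpy as np
import pandas as pd

# Hypothetical values for three of the columns discussed above.
df = pd.DataFrame({"Income": [20, 40, 60, 80, 100],
                   "Total_Amount": [100, 300, 500, 900, 1200],
                   "NumWebVisitsMonth": [9, 7, 6, 3, 2]})

corr = df.corr()  # Pearson correlation by default

# Keep only the upper triangle so each pair appears once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

strong = pairs[pairs.abs() >= 0.5].sort_values()  # assumed threshold
print(strong)
```

The matrix in corr is what a heatmap function such as seaborn's heatmap would take as input.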

Q.14 Understand the Education background of the customers who complained in the last 2 years. State the Education background of the customers who have registered the most number of complaints. (Hint:- you can use appropriate)

Findings

  1. There are 5 education levels in total across all customers.
  2. Customers with a Basic education never complained.
  3. Customers with a Graduation education background complained the most.
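
A minimal sketch of the grouping behind these findings. It assumes a 0/1 complaint flag in a column named Complain, which is an assumed name, and uses hypothetical rows.

```python
import pandas as pd

# Hypothetical rows; Complain is an assumed column name (1 = complained).
df = pd.DataFrame({
    "Education": ["Graduation", "PhD", "Basic", "Graduation",
                  "Master", "Basic", "Graduation"],
    "Complain": [1, 0, 0, 1, 1, 0, 0],
})

complaints = (df.groupby("Education")["Complain"]
                .sum()
                .sort_values(ascending=False))
top_education = complaints.idxmax()

print(complaints)
print("Most complaints:", top_education)
```

Because groups differ in size, dividing by the group counts would give a complaint rate that is fairer to compare than raw counts.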

Q.15 Use features 'Total_amount_spent', 'MntFruits', 'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts' and, 'MntGoldProds' in x-axis and y-axis and plot the following plots.

i) Plot a pairplot with hue as Response.

Findings in terms of response.

  1. MntWines and Total_Amount have a partial positive linear relationship.
  2. MntMeatProducts and Total_Amount have a partial positive linear relationship.
  3. The plot shows that most customers have response 0, which means most customers did not accept the campaign.

ii) Plot a pairplot with hue as Education.

Findings in terms of Education.

  1. MntWines and Total_Amount have a partial positive linear relationship.
  2. MntMeatProducts and Total_Amount have a partial positive linear relationship.
  3. The plot shows that customers with Graduation and PhD education contributed the most.

iii) Plot a pairplot with hue as Marital Status and write your key observations.

Findings in terms of Marital Status.

  1. MntWines and Total_Amount have a partial positive linear relationship.
  2. MntMeatProducts and Total_Amount have a partial positive linear relationship.
  3. No specific findings.